behavior space
- North America > United States > California (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Soft Quality-Diversity Optimization
Hedayatian, Saeed, Nikolaidis, Stefanos
Quality-Diversity (QD) algorithms constitute a branch of optimization concerned with discovering a diverse, high-quality set of solutions to an optimization problem. Current QD methods commonly maintain diversity by dividing the behavior space into discrete regions, ensuring that solutions are distributed across different parts of the space. The QD problem is then solved by searching for the best solution in each region. This approach poses challenges in large solution spaces, where storing many solutions is impractical, and in high-dimensional behavior spaces, where discretization becomes ineffective due to the curse of dimensionality. We present an alternative framing of the QD problem, called Soft QD, that sidesteps the need for discretization. We validate this formulation by demonstrating its desirable properties, such as monotonicity, and by relating its limiting behavior to the widely used QD Score metric. Furthermore, we leverage it to derive a novel differentiable QD algorithm, Soft QD Using Approximated Diversity (SQUAD), and demonstrate empirically that it is competitive with current state-of-the-art methods on standard benchmarks while scaling better to higher-dimensional problems.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Austria > Vienna (0.14)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
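For reference, the discretized formulation that Soft QD moves away from can be sketched in a few lines of Python. This is a minimal illustration of a MAP-Elites-style grid archive and the QD Score, not SQUAD; the uniform per-dimension binning, the helper names, and the fitness offset are assumptions made for the example.

```python
def cell_index(bd, lower, upper, bins):
    """Map a behavior descriptor (tuple of floats) to a tuple of grid indices."""
    idx = []
    for x, lo, hi in zip(bd, lower, upper):
        # Clip to the bounds, rescale to [0, 1], then pick one of `bins` uniform bins.
        t = (min(max(x, lo), hi) - lo) / (hi - lo)
        idx.append(min(int(t * bins), bins - 1))
    return tuple(idx)


def insert(archive, solution, fitness, bd, lower, upper, bins):
    """Keep only the fittest solution per cell (MAP-Elites-style replacement)."""
    key = cell_index(bd, lower, upper, bins)
    if key not in archive or fitness > archive[key][1]:
        archive[key] = (solution, fitness)
        return True
    return False


def qd_score(archive, offset=0.0):
    """Sum of (fitness - offset) over occupied cells; the offset keeps terms non-negative."""
    return sum(f - offset for _, f in archive.values())


def num_cells(bins, dims):
    """Number of cells in a uniform grid: grows exponentially with the descriptor dimension."""
    return bins ** dims


archive = {}
insert(archive, "a", 1.0, (0.1, 0.1), (0.0, 0.0), (1.0, 1.0), 10)
insert(archive, "b", 2.0, (0.12, 0.15), (0.0, 0.0), (1.0, 1.0), 10)  # same cell, replaces "a"
insert(archive, "d", 0.5, (0.9, 0.9), (0.0, 0.0), (1.0, 1.0), 10)
print(qd_score(archive))  # 2.5
```

With 10 bins per dimension, `num_cells(10, 2)` is 100 cells, but `num_cells(10, 20)` is 10**20, which illustrates the curse-of-dimensionality problem the abstract refers to.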
Vector Quantized-Elites: Unsupervised and Problem-Agnostic Quality-Diversity Optimization
Tsakonas, Constantinos, Chatzilygeroudis, Konstantinos
Quality-Diversity algorithms have transformed optimization by prioritizing the discovery of diverse, high-performing solutions over a single optimal result. However, traditional Quality-Diversity methods, such as MAP-Elites, rely heavily on predefined behavior descriptors and complete prior knowledge of the task to define the behavior space grid, which limits their flexibility and applicability. In this work, we introduce Vector Quantized-Elites (VQ-Elites), a novel Quality-Diversity algorithm that autonomously constructs a structured behavior space grid using unsupervised learning, eliminating the need for prior task-specific knowledge. At the core of VQ-Elites is the integration of Vector Quantized Variational Autoencoders, which enables the dynamic learning of behavior descriptors and the generation of a structured, rather than unstructured, behavior space grid, a significant advancement over existing unsupervised Quality-Diversity approaches. This design establishes VQ-Elites as a flexible, robust, and task-agnostic optimization framework. To further enhance the performance of unsupervised Quality-Diversity algorithms, we introduce behavior space bounding and cooperation mechanisms, which significantly improve convergence and performance. We also introduce the Effective Diversity Ratio and the Coverage Diversity Score, two novel metrics that quantify the actual diversity in the unsupervised setting. We validate VQ-Elites on robotic arm pose-reaching, mobile robot space-covering, and MiniGrid exploration tasks. The results demonstrate its ability to efficiently generate diverse, high-quality solutions, emphasizing its adaptability, scalability, robustness to hyperparameters, and potential to extend Quality-Diversity optimization to complex, previously inaccessible domains.
- North America > Canada (0.04)
- Europe > Greece > West Greece > Patra (0.04)
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
- North America > United States > California (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
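The idea behind a VQ-Elites-style grid, assigning each behavior to a discrete code, can be illustrated with nearest-neighbor vector quantization. In the hedged sketch below, the codebook is fixed by hand and the latent vector `z` is given directly; in VQ-Elites both would come from a trained Vector Quantized Variational Autoencoder, which is not reproduced here, and the function names are hypothetical.

```python
def nearest_code(z, codebook):
    """Return the index of the codebook vector closest to z in squared Euclidean distance."""
    best_i, best_d = -1, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(z, c))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def vq_insert(archive, solution, fitness, z, codebook):
    """Use each codebook entry as one archive cell and keep the fittest solution per cell."""
    cell = nearest_code(z, codebook)
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (solution, fitness)
    return cell


# Hand-picked 3-entry codebook standing in for learned embeddings (assumption).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
archive = {}
vq_insert(archive, "policy_a", 0.7, (0.9, 0.2), codebook)
vq_insert(archive, "policy_b", 0.4, (0.1, 0.8), codebook)
vq_insert(archive, "policy_c", 0.9, (0.8, -0.1), codebook)  # same code as policy_a, fitter
print(archive)  # {1: ('policy_c', 0.9), 2: ('policy_b', 0.4)}
```

In this sketch the number of cells is set by the codebook size rather than by a per-dimension binning; how VQ-Elites sizes and trains its codebook is described in the paper itself.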
Complex System Exploration with Interactive Human Guidance
Morel, Bastien, Moulin-Frier, Clément, Barla, Pascal
The diversity of patterns that emerge from complex systems motivates their use for scientific or artistic purposes. Exploring these systems is challenging because of the size of the parameter space and the strongly non-linear mapping between parameters and emerging patterns. In addition, artists and scientists who explore complex systems do so with particular patterns in mind. Taking these expectations into account adds a new set of challenges that the exploration process must address. We present design choices, and their implementation, that address these challenges, enabling the sample-efficient maximization of the diversity of patterns discovered in the user's region of interest, which we call the constrained diversity. The region of interest is expressed as explicit constraints. The user formulates these constraints in a system-agnostic way, and adding them enables interactive exploration that achieves constrained diversity while maintaining global diversity.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Spain > Community of Madrid > Madrid (0.04)
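To make the distinction between constrained and global diversity concrete, the sketch below uses a deliberately simple proxy: diversity is the number of distinct grid cells covered by 2-D pattern descriptors, and the user's region of interest is a list of predicates. The descriptor space, the cell size, and the coverage proxy are assumptions for illustration; they are not the diversity measure or the constraint formulation used in the paper.

```python
def covered_cells(descriptors, cell_size):
    """Count distinct grid cells touched by 2-D descriptors (a simple diversity proxy)."""
    return len({(int(x // cell_size), int(y // cell_size)) for x, y in descriptors})


def constrained_and_global_diversity(descriptors, constraints, cell_size=0.25):
    """Return (diversity inside the region of interest, diversity over all samples)."""
    inside = [d for d in descriptors if all(c(d) for c in constraints)]
    return covered_cells(inside, cell_size), covered_cells(descriptors, cell_size)


# Hypothetical region of interest: patterns whose first descriptor is below 0.5.
constraints = [lambda d: d[0] < 0.5]
samples = [(0.1, 0.1), (0.3, 0.1), (0.8, 0.9), (0.6, 0.2)]
print(constrained_and_global_diversity(samples, constraints))  # (2, 4)
```

Tracking both numbers mirrors the stated goal of increasing diversity inside the region of interest without collapsing global diversity.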
Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning
Osika, Zuzanna, Zatarain-Salazar, Jazmin, Oliehoek, Frans A., Murukannaiah, Pradeep K.
Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives. An MORL agent must make decisions based on the diverse signals provided by distinct reward functions. Training an MORL agent yields a set of solutions (policies), each presenting distinct trade-offs among the objectives (expected returns). Compared with producing a single policy, MORL enhances explainability by enabling fine-grained comparisons of the policies in the solution set based on their trade-offs. However, the solution set is typically large and multi-dimensional, with each policy (e.g., a neural network) represented by its objective values. We propose an approach for clustering the solution set generated by MORL. By considering both policy behavior and objective values, our clustering method can reveal the relationship between policy behaviors and regions in the objective space. This approach can enable decision makers (DMs) to identify overarching trends and insights in the solution set rather than examining each policy individually. We tested our method in four multi-objective environments and found that it outperformed traditional k-medoids clustering. Additionally, we include a case study that demonstrates its real-world application.
- North America > United States > New Jersey > Essex County > Newark (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > Macao (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
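The baseline named in the abstract, k-medoids, can cluster policies using only their objective vectors. A minimal pure-Python sketch of the alternating (Voronoi-iteration) variant is given below for orientation; it is not the authors' behavior-aware method, and the first-k initialization and Euclidean distance are assumptions chosen to keep the example deterministic.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids. Returns (medoid indices, cluster label per point)."""
    medoids = list(range(k))  # assumption: deterministic init with the first k points

    def assign(meds):
        # Assignment step: each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda j: math.dist(p, points[meds[j]])) for p in points]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(medoids[j])
                continue
            # Update step: the member with the smallest total distance to the others.
            new_medoids.append(
                min(members, key=lambda m: sum(math.dist(points[m], points[i]) for i in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Objective vectors (two expected returns) for six hypothetical policies.
objectives = [(1.0, 9.0), (1.2, 8.8), (0.9, 9.1), (9.0, 1.0), (8.8, 1.3), (9.1, 0.8)]
medoids, labels = k_medoids(objectives, k=2)
print(medoids, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Because this sketch sees only objective values, two policies with similar returns fall into the same cluster however differently they behave; the method in the abstract additionally takes policy behavior into account.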
Quality with Just Enough Diversity in Evolutionary Policy Search
Templier, Paul, Grillotti, Luca, Rachelson, Emmanuel, Wilson, Dennis G., Cully, Antoine
Evolution Strategies (ES) are effective gradient-free optimization methods that can be competitive with gradient-based approaches for policy search. ES rely only on the total episodic scores of the solutions in their population, from which they estimate fitness gradients for their update without access to true gradient information. However, this makes them sensitive to deceptive fitness landscapes, and they tend to explore only one way to solve a problem. Quality-Diversity methods such as MAP-Elites introduce additional information through behavior descriptors (BD) to return a population of diverse solutions; this helps exploration but means that a large part of the evaluation budget is not focused on finding the best-performing solution. Here we show that behavior information can also be leveraged to find the best policy by identifying promising search areas, which can then be explored efficiently with ES. We introduce the framework of Quality with Just Enough Diversity (JEDi), which learns the relationship between behavior and fitness to focus evaluations on solutions that matter. When trying to reach higher fitness values, JEDi outperforms both QD and ES methods on hard exploration tasks like mazes and on complex control problems with large policies.
- North America > United States > District of Columbia > Washington (0.05)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- North America > United States > New York > New York County > New York City (0.04)
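The abstract's description of ES, estimating a fitness gradient from episodic scores alone, corresponds to the standard Gaussian-smoothing estimator. The sketch below uses antithetic sampling; the hyperparameters and the toy quadratic objective are assumptions, and this is plain ES, not JEDi.

```python
import random


def es_step(theta, fitness, rng, sigma=0.1, lr=0.05, n_pairs=25):
    """One ES update using only fitness scores (no access to true gradients).

    Estimates grad E[F(theta + sigma * eps)] as
    1 / (2 * sigma * n_pairs) * sum_k (F(theta + sigma*eps_k) - F(theta - sigma*eps_k)) * eps_k.
    """
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        # Antithetic pair: the score difference weights the shared perturbation.
        for i in range(dim):
            grad[i] += (plus - minus) * eps[i]
    grad = [g / (2 * sigma * n_pairs) for g in grad]
    # Gradient ascent on the estimated fitness gradient.
    return [t + lr * g for t, g in zip(theta, grad)]


# Toy objective: negative squared distance to a target the optimizer only sees via scores.
target = [1.0, -2.0]


def fitness(x):
    return -sum((a - b) ** 2 for a, b in zip(x, target))


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(300):
    theta = es_step(theta, fitness, rng)
print(theta)  # close to [1.0, -2.0]
```

On a smooth objective like this one, the estimator recovers the optimum; the abstract's point is that on deceptive landscapes such score-only updates can get stuck, which is the gap JEDi targets.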
Diversity from Human Feedback
Wang, Ren-Jian, Xue, Ke, Wang, Yutong, Yang, Peng, Fu, Haobo, Fu, Qiang, Qian, Chao
Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. Defining a suitable diversity measure is a longstanding problem. Many methods rely on expert knowledge to define a proper behavior space from which the diversity measure is derived, but this is challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements than direct data-driven approaches do, and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
- Asia > China > Jiangsu Province > Nanjing (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Maryland > Baltimore (0.04)
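The abstract notes that a learned behavior descriptor can be combined with any distance to define a diversity measure. As a hedged illustration, the sketch below uses a fixed linear projection as a stand-in for the descriptor DivHF would learn from human feedback, and mean pairwise Euclidean distance as the diversity measure; both choices, and the helper names, are assumptions.

```python
import itertools
import math


def linear_descriptor(weights):
    """Hypothetical stand-in for a learned descriptor: a fixed linear projection of raw features."""
    def describe(features):
        return [sum(w * f for w, f in zip(row, features)) for row in weights]
    return describe


def diversity(solutions, describe, distance=math.dist):
    """Mean pairwise distance between the descriptors of the given solutions."""
    descriptors = [describe(s) for s in solutions]
    pairs = list(itertools.combinations(descriptors, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


solutions = [[0, 0, 5], [3, 4, -5]]
# A descriptor that ignores the third feature (e.g. one a human judged irrelevant).
ignore_third = linear_descriptor([[1, 0, 0], [0, 1, 0]])
identity = linear_descriptor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(diversity(solutions, ignore_third))  # 5.0
print(diversity(solutions, identity))
```

The same pair of solutions scores differently under the two descriptors, which is why the choice of behavior space shapes what counts as diverse.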